fix: report disk usage for datadir mount point #8913

Closed
frederik12321 wants to merge 1 commit into sigp:unstable from frederik12321:fix/datadir-disk-usage

@frederik12321

Summary

  • Pass the datadir path to psutil::disk::disk_usage() instead of hardcoding "/", so that disk_node_bytes_total and disk_node_bytes_free reflect the filesystem containing the data directory
  • Update all three consumers: /lighthouse/health HTTP endpoint, remote monitoring, and Prometheus /metrics scraping
  • Backwards compatible — falls back to root filesystem when data_dir is unavailable

Motivation

Operators who mount a separate drive for beacon chain data see incorrect disk stats because the health metrics always report the root filesystem. This affects remote monitoring dashboards and alerting.

Closes #6687

Changes

Core fix (common/health_metrics/src/observe.rs):

  • Add observe_health_with_data_dir() and observe_system_health_with_data_dir() free functions that accept a data_dir: &Path parameter
  • Existing Observe trait impls delegate to the new functions with "/" as fallback
  • psutil::disk::disk_usage(data_dir) automatically resolves the path to its containing filesystem
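
The delegation pattern described above can be sketched roughly as follows. This is not the code from the PR: `DiskReport`, `observe_with_data_dir`, and `observe` are hypothetical names, and the disk probe is passed in as a closure standing in for `psutil::disk::disk_usage()` so the sketch compiles with the standard library alone.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the disk-related fields of a health report.
#[derive(Debug, PartialEq)]
pub struct DiskReport {
    /// Path that was handed to the disk-usage probe.
    pub probed_path: PathBuf,
    pub total_bytes: u64,
    pub free_bytes: u64,
}

/// Path-aware entry point. `probe` is an assumption of this sketch: it stands
/// in for the real disk-usage call and returns `(total_bytes, free_bytes)`.
pub fn observe_with_data_dir<F>(data_dir: &Path, probe: F) -> Result<DiskReport, String>
where
    F: Fn(&Path) -> Result<(u64, u64), String>,
{
    let (total_bytes, free_bytes) =
        probe(data_dir).map_err(|e| format!("Unable to get disk usage info: {}", e))?;
    Ok(DiskReport {
        probed_path: data_dir.to_path_buf(),
        total_bytes,
        free_bytes,
    })
}

/// Legacy entry point: keeps the old behaviour by delegating with "/" as the path.
pub fn observe<F>(probe: F) -> Result<DiskReport, String>
where
    F: Fn(&Path) -> Result<(u64, u64), String>,
{
    observe_with_data_dir(Path::new("/"), probe)
}
```

Keeping the old entry point as a thin wrapper means existing callers need no changes, while callers that know the data directory can opt in to the path-aware function.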

Prometheus metrics (beacon_node/http_metrics, validator_client/http_metrics):

  • Add data_dir: Option<PathBuf> to metrics Context structs
  • Pass data_dir to scrape_health_metrics_for_data_dir() when available
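
As a rough illustration of the optional field and its fallback, the sketch below uses a trimmed-down `Context` with a hypothetical `disk_usage_path()` helper. The real `Context` structs carry other fields, and the PR may select the path differently; only the `data_dir: Option<PathBuf>` field and the fallback to the root filesystem come from the description above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed-down metrics context holding only the new field.
#[derive(Default)]
pub struct Context {
    /// Data directory, if it was threaded through from the config.
    pub data_dir: Option<PathBuf>,
}

impl Context {
    /// Path whose filesystem should be reported: the data directory when
    /// known, otherwise the root filesystem (the pre-existing behaviour).
    pub fn disk_usage_path(&self) -> &Path {
        self.data_dir.as_deref().unwrap_or(Path::new("/"))
    }
}
```

Making the field an `Option` is what keeps the change backwards compatible: a context built without a data directory still reports the root filesystem.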

HTTP health endpoint (beacon_node/http_api, validator_client/http_api):

  • Pass data_dir to observe_health_with_data_dir() in /lighthouse/health handlers

Remote monitoring (common/monitoring_api):

  • Add data_dir: Option<PathBuf> to Config and MonitoringHttpClient
  • Use observe_system_health_with_data_dir() when data_dir is available

Config propagation (beacon_node/src/config.rs, validator_client/src/config.rs, beacon_node/client/src/builder.rs, validator_client/src/lib.rs):

  • Thread data_dir through to monitoring config and metrics context

Also fixes a minor typo in an error message: "Unable to disk usage info" → "Unable to get disk usage info".

Test plan

  • cargo check --workspace — zero errors, zero warnings
  • cargo fmt --all -- --check — clean
  • cargo clippy --workspace — no new warnings
  • Manual verification: start a beacon node with --datadir /some/separate/mount and confirm /lighthouse/health returns disk stats for that mount instead of root

🤖 Generated with Claude Code

@cla-assistant

cla-assistant Bot commented Feb 28, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


frederik does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Pass the datadir path to `psutil::disk::disk_usage()` so that
disk metrics (`disk_node_bytes_total`, `disk_node_bytes_free`)
reflect the filesystem containing the data directory, not the
root filesystem. This fixes incorrect reporting for operators
who mount a separate drive for beacon chain data.

All three consumers are updated:
- `/lighthouse/health` HTTP endpoint (BN + VC)
- Remote monitoring (`monitoring_api`)
- Prometheus `/metrics` scraping (BN + VC)

Closes sigp#6687

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@frederik12321

Author

Closing this PR — I used AI tooling (Claude) to generate these changes and wouldn't be able to adequately explain or maintain the code myself. The fix is valid if someone wants to pick it up.
